Empirical Analysis on Comparing the Performance of Alpha Miner Algorithm in SQL Query Language and NoSQL Column-Oriented Databases Using Apache Phoenix
Authors
Abstract
Process-Aware Information Systems (PAIS) are IT systems that support business processes and generate large amounts of event logs from the execution of those processes. An event log is represented as tuples of the form (CaseID, Timestamp, Activity, Actor). Process Mining is a new and emerging field that aims to analyze event logs in order to discover, enhance and improve business processes, and to check conformance between run-time and design-time business processes. The large volumes of event logs generated are stored in databases. Relational databases perform well for a certain class of applications; however, there is a class of applications for which relational databases do not scale. NoSQL database systems emerged to handle such applications. Discovering a process model (workflow model) from event logs is one of the most challenging and important Process Mining tasks, and the α-miner algorithm is one of the first and most widely used Process Discovery techniques. Our objective is to investigate which type of database (relational or NoSQL) performs better for a Process Discovery application in Process Mining. We implement the α-miner algorithm on relational (row-oriented) and NoSQL (column-oriented) databases in database query languages, so that the algorithm is tightly coupled to the database. We present a performance benchmark comparing the α-miner algorithm on a row-oriented database and on a NoSQL column-oriented database, to determine which database can efficiently store massive event logs and analyze them in seconds to discover a process model.
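The core idea of running the α-miner's relation-extraction steps as database queries over (CaseID, Timestamp, Activity, Actor) tuples can be illustrated with a small sketch. It assumes a hypothetical SQLite table `event_log(case_id, ts, activity, actor)`; SQLite is only a stand-in, not the row-oriented database or the Apache Phoenix setup the study benchmarks, and every table, column and function name here is illustrative. SQL derives the directly-follows relation and the start/end activities; Python then applies the standard α-algorithm steps (causality, the # relation and the maximal place pairs Y_L). Building the final Petri-net places and arcs from Y_L is a direct mapping and is omitted.

```python
import sqlite3
from itertools import combinations

# Toy log of (CaseID, Timestamp, Activity, Actor) tuples; values are illustrative.
EVENTS = [
    ("c1", 1, "A", "u1"), ("c1", 2, "B", "u2"), ("c1", 3, "C", "u3"), ("c1", 4, "D", "u1"),
    ("c2", 1, "A", "u1"), ("c2", 2, "C", "u3"), ("c2", 3, "B", "u2"), ("c2", 4, "D", "u1"),
    ("c3", 1, "A", "u1"), ("c3", 2, "E", "u4"), ("c3", 3, "D", "u1"),
]


def load_log(events):
    """Load events into a hypothetical in-memory table named event_log."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE event_log (case_id TEXT, ts INTEGER, activity TEXT, actor TEXT)")
    conn.executemany("INSERT INTO event_log VALUES (?, ?, ?, ?)", events)
    return conn


def directly_follows(conn):
    """a > b: b occurs immediately after a in some case (assumes distinct ts per case)."""
    rows = conn.execute("""
        SELECT DISTINCT e1.activity, e2.activity
        FROM event_log e1
        JOIN event_log e2 ON e2.case_id = e1.case_id AND e2.ts > e1.ts
        WHERE NOT EXISTS (
            SELECT 1 FROM event_log e3
            WHERE e3.case_id = e1.case_id AND e3.ts > e1.ts AND e3.ts < e2.ts)
    """)
    return set(rows)


def start_end(conn):
    """T_I and T_O: activities that start or end at least one case."""
    t_i = {r[0] for r in conn.execute("""
        SELECT DISTINCT activity FROM event_log e
        WHERE ts = (SELECT MIN(ts) FROM event_log WHERE case_id = e.case_id)""")}
    t_o = {r[0] for r in conn.execute("""
        SELECT DISTINCT activity FROM event_log e
        WHERE ts = (SELECT MAX(ts) FROM event_log WHERE case_id = e.case_id)""")}
    return t_i, t_o


def alpha_miner(conn):
    """Classic alpha-algorithm steps on top of the SQL-derived relations."""
    t_l = {r[0] for r in conn.execute("SELECT DISTINCT activity FROM event_log")}
    follows = directly_follows(conn)
    causal = {(a, b) for (a, b) in follows if (b, a) not in follows}  # a -> b

    def choice(a, b):
        # a # b: neither directly follows the other (for a == b: no self-loop)
        return (a, b) not in follows and (b, a) not in follows

    acts = sorted(t_l)
    # Candidate place sides: non-empty activity sets whose members are pairwise in #.
    # Enumerating all subsets is exponential: fine for a toy log, not for real ones.
    sides = [frozenset(c) for n in range(1, len(acts) + 1)
             for c in combinations(acts, n)
             if all(choice(x, y) for x in c for y in c)]
    # X_L: pairs (pre, post) with a -> b for every a in pre and b in post.
    x_l = [(pre, post) for pre in sides for post in sides
           if all((a, b) in causal for a in pre for b in post)]
    # Y_L: keep only the maximal pairs of X_L.
    y_l = [(pre, post) for (pre, post) in x_l
           if not any(pre <= p2 and post <= q2 and (pre, post) != (p2, q2)
                      for (p2, q2) in x_l)]
    t_i, t_o = start_end(conn)
    return t_l, t_i, t_o, y_l


if __name__ == "__main__":
    t_l, t_i, t_o, y_l = alpha_miner(load_log(EVENTS))
    print("start:", sorted(t_i), "end:", sorted(t_o))
    for pre, post in sorted(y_l, key=lambda p: (sorted(p[0]), sorted(p[1]))):
        print(sorted(pre), "->", sorted(post))
```

On this toy log (traces A,B,C,D; A,C,B,D; A,E,D) the sketch yields the place pairs ({A},{B,E}), ({A},{C,E}), ({B,E},{D}) and ({C,E},{D}), with A as the only start activity and D as the only end activity. Pushing the directly-follows and start/end queries into SQL mirrors the "tightly coupled to the database" design described above; the subset enumeration stays in Python only to keep the example short.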
Similar resources
Graph or Relational Databases: A Speed Comparison for Process Mining Algorithm
Process-Aware Information Systems (PAIS) are IT systems that manage and support business processes and generate large event logs from the execution of business processes. An event log is represented as a tuple of the form CaseID, TimeStamp, Activity and Actor. Process Mining is an emerging area of research that deals with the study and analysis of business processes based on event logs. Process Minin...
Comparative Study of Column Oriented NoSQL Databases on Characteristics
NoSQL database, also called Not Only SQL, is an approach to data management and database design that's useful for very large sets of distributed data. The growing popularity of big data will compel many companies to use NoSQL databases instead of traditional databases. Generally, there are three main types of NoSQL databases: key-value stores, column-oriented databases and document-based stores....
Performance Analysis of Scalable SQL and NoSQL Databases: A Quantitative Approach
PERFORMANCE ANALYSIS OF SCALABLE SQL AND NOSQL DATABASES: A QUANTITATIVE APPROACH by HARISH BALASUBRAMANIAN, May 2014. Advisor: Dr. Weisong Shi. Major: Computer Science. Degree: Master of Science. Benchmarking is a common method for evaluating and choosing a NoSQL database. There are already lots of benchmarking reports available on the internet and in research papers. Most of the ben...
Physical Data Warehouse Design on NoSQL Databases - OLAP Query Processing over HBase
Nowadays, data warehousing and online analytical processing (OLAP) are core technologies in business intelligence and therefore have drawn much interest from researchers over the last decade. However, these technologies have been mainly developed for relational database systems in centralized environments. In other words, these technologies have not been designed to be applied in scalable systems s...
The SQL++ Unifying Semi-structured Query Language, and an Expressiveness Benchmark of SQL-on-Hadoop, NoSQL and NewSQL Databases
SQL-on-Hadoop, NewSQL and NoSQL databases provide semi-structured data models (typically JSON-based) and respective query languages. Lack of formal syntax and semantics, idiomatic (non-SQL) language constructs and large variations in syntax, semantics and actual capabilities pose problems even to database experts: It is hard to understand, compare and use these languages. It is especially tediou...
Journal title:
- CoRR
Volume: abs/1703.05481
Publication date: 2017